GH-49956: [GLib] Add fallback data type for unknown extension data type#49969
Conversation
|
|
There was a problem hiding this comment.
Pull request overview
Adds a concrete GLib wrapper type to represent Arrow extension types that aren’t recognized by the GLib bindings, avoiding failures when converting schemas/types originating from C++.
Changes:
- Introduces
GArrowUnknownExtensionDataType(a concrete subclass ofGArrowExtensionDataType) to serve as a fallback wrapper. - Updates C++→GLib type conversion to return
GArrowUnknownExtensionDataTypefor unknown extensions, while still mapping known canonical extensions (UUID and fixed-shape tensor) to their dedicated GLib types. - Extends Ruby tests to verify UUID and fixed-shape tensor extension types round-trip through schema conversion.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| c_glib/test/test-uuid-data-type.rb | Adds a round-trip schema conversion test for UUID extension types. |
| c_glib/test/test-fixed-shape-tensor-data-type.rb | Adds a round-trip schema conversion test for fixed-shape tensor extension types. |
| c_glib/arrow-glib/basic-data-type.h | Declares the new GArrowUnknownExtensionDataType public GLib type. |
| c_glib/arrow-glib/basic-data-type.cpp | Defines GArrowUnknownExtensionDataType and updates extension-type wrapping logic in garrow_data_type_new_raw(). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| g_object_ref(garrow_data_type); | ||
| return GARROW_DATA_TYPE(garrow_data_type); | ||
| auto arrow_extension_data_type = | ||
| std::dynamic_pointer_cast<arrow::ExtensionType>(*arrow_data_type); |
| case arrow::Type::type::EXTENSION: | ||
| type = GARROW_TYPE_UNKNOWN_EXTENSION_DATA_TYPE; | ||
| { | ||
| auto g_extension_data_type = | ||
| std::static_pointer_cast<garrow::GExtensionType>(*arrow_data_type); | ||
| if (g_extension_data_type) { | ||
| auto garrow_data_type = g_extension_data_type->garrow_data_type(); | ||
| g_object_ref(garrow_data_type); | ||
| return GARROW_DATA_TYPE(garrow_data_type); | ||
| auto arrow_extension_data_type = | ||
| std::static_pointer_cast<arrow::ExtensionType>(*arrow_data_type); | ||
| auto name = arrow_extension_data_type->extension_name(); | ||
| if (name == "arrow.fixed_shape_tensor") { | ||
| type = GARROW_TYPE_FIXED_SHAPE_TENSOR_DATA_TYPE; | ||
| } else if (name == "arrow.uuid") { | ||
| type = GARROW_TYPE_UUID_DATA_TYPE; | ||
| } else { | ||
| auto g_extension_data_type = | ||
| std::dynamic_pointer_cast<garrow::GExtensionType>(*arrow_data_type); | ||
| if (g_extension_data_type) { | ||
| auto garrow_data_type = g_extension_data_type->garrow_data_type(); | ||
| g_object_ref(garrow_data_type); | ||
| return GARROW_DATA_TYPE(garrow_data_type); | ||
| } | ||
| } |
| G_DEFINE_TYPE(GArrowUnknownExtensionDataType, | ||
| garrow_unknown_extension_data_type, | ||
| GARROW_TYPE_EXTENSION_DATA_TYPE) | ||
|
|
||
| static void | ||
| garrow_unknown_extension_data_type_init(GArrowUnknownExtensionDataType *object) | ||
| { | ||
| } | ||
|
|
||
| static void | ||
| garrow_unknown_extension_data_type_class_init(GArrowUnknownExtensionDataTypeClass *klass) | ||
| { | ||
| } |
| case arrow::Type::type::EXTENSION: | ||
| type = GARROW_TYPE_UNKNOWN_EXTENSION_DATA_TYPE; | ||
| { | ||
| auto g_extension_data_type = | ||
| std::static_pointer_cast<garrow::GExtensionType>(*arrow_data_type); | ||
| if (g_extension_data_type) { | ||
| auto garrow_data_type = g_extension_data_type->garrow_data_type(); | ||
| g_object_ref(garrow_data_type); | ||
| return GARROW_DATA_TYPE(garrow_data_type); | ||
| auto arrow_extension_data_type = | ||
| std::static_pointer_cast<arrow::ExtensionType>(*arrow_data_type); | ||
| auto name = arrow_extension_data_type->extension_name(); | ||
| if (name == "arrow.fixed_shape_tensor") { | ||
| type = GARROW_TYPE_FIXED_SHAPE_TENSOR_DATA_TYPE; | ||
| } else if (name == "arrow.uuid") { | ||
| type = GARROW_TYPE_UUID_DATA_TYPE; | ||
| } else { |
…ata type Users can define any extension data type. So we may encounter an unknown extension type. We can't use `GArrowExtensionDataType` for an unknown extension data type because `GArrowExtensionDataType` is an abstract class. This adds `GArrowUnknownExtensionDataType` and uses it for all unknown extension data types.
|
+1 |
|
After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit c103668. There was 1 benchmark result indicating a performance regression:
The full Conbench report has more details. It also includes information about 141 possible false positives for unstable benchmarks that are known to sometimes produce them. |
Rationale for this change
Users can define any extension data type. So we may encounter an unknown extension type. We can't use
GArrowExtensionDataTypefor an unknown extension data type becauseGArrowExtensionDataTypeis an abstract class.What changes are included in this PR?
This adds
GArrowUnknownExtensionDataTypeand uses it for all unknown extension data types.Are these changes tested?
Yes.
Are there any user-facing changes?
Yes.